2022: A new beginning

Goals for 2022

lruolin
01-26-2022

2021: Recap

I first started taking R lessons at SMU in 2020, and took 6 modules on data analytics from Feb to Jun. After that, I took another 3 modules from Nov 2020 to Jan 2021. When my lessons ended, I asked myself what I should do so that I could keep practicing coding and not forget what I had learnt. So I started to experiment with the distill package and managed to get this blog started… it has been a year already!

2021 was a year of learning. I tried to go back to basics and familiarize myself with the R4DS book. I applied more of what I learnt at work, and some happy moments were: writing a simple package with the company colors, which I could use for making visualizations; dipping my toes into writing more functions and using curly braces; using map on my own functions; trying out plotly to make interactive visualizations, which helped me a lot at work for labeling GC peaks by their Kovats Index; and trying out PCA on larger datasets, following Julia Silge’s workflow, to get a better sense of my data. I also used the DT package quite often when handling tabular data, as it has nice formatting and filter functions. All of these were ad-hoc projects at work, and I really wish I had done them more frequently, so that I would know the code by heart. At the moment, I still need to jot it down somewhere and refer to my notes to carry out the same workflow.

Optical Character Recognition (OCR) is something I wish I could look into more. It is a good way to mine text data, especially as I have a lot of information from books and trade journals. Rather than manually keying it into Excel, I wish for a magic wand to format all the information I need neatly into tables.

TidyTuesday has a wealth of data for learners like me, and I was practicing alongside David Robinson’s videos as well…

If the first half of the year was about consolidating what I learnt about the tidyverse, the second half was spent understanding tidymodels. I’m really glad to have discovered Julia Silge’s blog and to be able to learn from her screencasts. She is an inspiration to me: I love her visualizations and clear explanations, and I learnt so much from her short, concise videos. I took only 3 modules on tidymodels, enough to get a flavor of machine learning. Our prof gave us some useful textbooks to read, so the second half of the year was really about practicing and reading up more on machine learning. However, I mostly worked on classification and regression, i.e. supervised learning. There is still a lot to learn about unsupervised learning, as well as using tidymodels on text data, which I find fascinating but have not had the time to explore.

One thing I realised is that if I really want to go into data science, I will also need to learn Python. I really love R. As an analogy, R is like my mother tongue (I speak Mandarin at home and feel more at ease with it), but the working language is English, which is like Python. Almost every job ad I came across required some working knowledge of Python…

I went “missing” for the month of December and the first three weeks of January. This was because I was considering signing up for the AI Apprenticeship Programme (AIAP) by AI Singapore, and was contemplating a mid-career switch. I wanted to do more coding in my day-to-day work, and considered a few options, such as signing up for the SkillsFuture mid-career courses, which give an allowance for taking classes and internships over 6 months. I also considered coding bootcamps to upgrade further… but was torn between resigning and studying part-time at the expense of family time.

So, between November and January, I was really busy trying to translate what I learnt in R into Python. There were many days of waking up at 5:30 am, and lunch breaks spent making notes, to prepare myself for a technical assignment that we would be given to work on over five days in January. In fact, I think it has irreversibly changed my waking habits, as I often wake up before 5 am for no reason nowadays…

Knowing R helped a bit in learning Python, because I roughly knew what could be done and just had to find out how to do it in Python. However, the syntax is a little different, so I had to familiarize myself with it. I wrote many cheatsheets in the form of notebooks, and experimented with Jupyter Notebook, JupyterLab, Spyder, bash scripting, creating virtual environments, understanding the proper workflow, figuring out how to run machine learning models, and using pipelines for machine learning preprocessing and modelling… I was also unaware of the standard or good practices in Python, unlike in R, where our prof shared with us the industry practices that you cannot find in textbooks.

The technical assignment was finally submitted, and I really felt like I had just taken a major exam, with a pile of notes on my messy table. When I submitted it and saw the progress bar reach 100%, I went to hug my daughter tightly. I am really thankful to my husband for helping me on the family front over the last two months, and to my daughter for giving me some time off to work on this project.

The results will only be out in February. I don’t think I did well on the whole, but my initial goal was to take the January paper, understand the gaps in my learning and work on them, while keeping AIAP as an option for switching fields in 2022, or retaking the technical assessment at the end of the year. Only 10% of the people who register for AIAP eventually make it, so the odds are really against me, a newbie to Python, compared to the many applicants who have already taken MITB at SMU… I also see myself as more of a data analyst/data scientist than a data engineer/data architect, and the second part of the assessment really stretched my limits!

But I am really proud of myself for completing the EDA part of the assignment. Imagine relearning in Python, by myself and in just 2 months, everything I had learnt in R…

2022: Goals

2022 is going to be an exciting year, judging from the events that have already happened in the first three weeks. My goals for this year are:

I took a week’s break after submitting the assignment for AIAP, but it’s now back to regular programming!

ML Workflow

This is one of the key takeaways I had from learning about the ML workflow last month. I was always wondering: what should I look out for during EDA? Let me record the steps here for my future self. The text below is adapted from Appendix 2 of the book Data Preparation for Machine Learning by Jason Brownlee (https://machinelearningmastery.com/data-preparation-for-machine-learning/).

1. Understanding the context

2. Import data

3. Exploratory Data Analysis

4. Data Wrangling

5. Splitting data

6. Preprocessing steps for training dataset

7. Model selection

8. Model optimization

9. Communicate the findings

10. Deployment
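The “Splitting data” and “Preprocessing steps for training dataset” steps are where it is easiest to leak information from the test set into training. Below is a minimal sketch in plain Python (standard library only), assuming a single numeric feature and a hypothetical toy dataset: the scaling parameters (mean and standard deviation) are learned from the training split only and then reused, unchanged, on the test split. `train_test_split` and `ZScoreScaler` here are hand-written, hypothetical helpers for illustration, not a library API.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Hypothetical helper: shuffle a copy of the rows and split into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


class ZScoreScaler:
    """Hypothetical minimal scaler: learns mean and SD in fit(), reuses them in transform()."""

    def fit(self, values):
        self.mean_ = statistics.mean(values)
        self.sd_ = statistics.pstdev(values)
        return self

    def transform(self, values):
        sd = self.sd_ or 1.0  # avoid division by zero for a constant feature
        return [(v - self.mean_) / sd for v in values]


# Hypothetical toy data: one numeric feature, 20 observations
data = [float(x) for x in range(1, 21)]

# Split first, before any preprocessing parameters are learned
train, test = train_test_split(data)

# Fit the preprocessing on the training set only...
scaler = ZScoreScaler().fit(train)
train_scaled = scaler.transform(train)

# ...then apply the same training-set mean/SD to the test set (no refitting)
test_scaled = scaler.transform(test)

print(len(train), len(test))  # 15 5
```

In scikit-learn, the same idea is usually expressed with `train_test_split` from `sklearn.model_selection` and a `Pipeline` (for example `StandardScaler` followed by a model), so that calling `fit` on the training data fits the preprocessing and the model together. In tidymodels, the analogous pattern is `initial_split()` from rsample, with a recipe that is `prep()`-ed on the training data and `bake()`-d on the test data.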

Useful books (Python)

Citation

For attribution, please cite this work as

lruolin (2022, Jan. 26). pRactice corner: 2022: A new beginning. Retrieved from https://lruolin.github.io/myBlog/posts/20220125 - The year ahead/

BibTeX citation

@misc{lruolin20222022:,
  author = {lruolin, },
  title = {pRactice corner: 2022: A new beginning},
  url = {https://lruolin.github.io/myBlog/posts/20220125 - The year ahead/},
  year = {2022}
}